Results 1 - 20 of 32
1.
Hum Genet ; 2024 Jan 16.
Article in English | MEDLINE | ID: mdl-38227011

ABSTRACT

Missense mutations are known contributors to diverse genetic disorders, due to their subtle, single amino acid changes imparted on the resultant protein. Because of this, understanding the impact of these mutations on protein stability and function is crucial for unravelling disease mechanisms and developing targeted therapies. The Critical Assessment of Genome Interpretation (CAGI) provides a valuable platform for benchmarking state-of-the-art computational methods in predicting the impact of disease-related mutations on protein thermodynamics. Here we report the performance of our comprehensive platform of structure-based computational approaches to evaluate mutations impacting protein structure and function on 3 challenges from CAGI6: Calmodulin, MAPK1 and MAPK3. Our stability predictors have achieved correlations of up to 0.74 and AUCs of 1 when predicting changes in ΔΔG for MAPK1 and MAPK3, respectively, and AUC of up to 0.75 in the Calmodulin challenge. Overall, our study highlights the importance of structure-based approaches in understanding the effects of missense mutations on protein thermodynamics. The results obtained from the CAGI6 challenges contribute to the ongoing efforts to enhance our understanding of disease mechanisms and facilitate the development of personalised medicine approaches.

2.
Proteins ; 2023 Oct 23.
Article in English | MEDLINE | ID: mdl-37870486

ABSTRACT

Proteins are molecular machinery that participate in virtually all essential biological functions within the cell, which are tightly related to their 3D structure. The importance of understanding the protein structure-function relationship is highlighted by the exponential growth of experimental structures, which has been greatly expanded by recent breakthroughs in protein structure prediction, most notably RoseTTAFold and AlphaFold2. These advances have prompted the development of several computational approaches that leverage these data sources to explore potential biological interactions. However, most methods are generally limited to analysis of single types of interactions, such as protein-protein or protein-ligand interactions, and their complexity limits their usability to expert users. Here we report CSM-Potential2, a deep learning platform for the analysis of binding interfaces on protein structures. In addition to prediction of protein-protein interaction binding sites and classification of biological ligands, our new platform incorporates prediction of interactions with nucleic acids at the residue level and allows for ligand transplantation based on sequence and structure similarity to experimentally determined structures. We anticipate our platform to be a valuable resource that provides easy access to a range of state-of-the-art methods to expert and non-expert users for the study of biological interactions. Our tool is freely available as an easy-to-use web server and API at https://biosig.lab.uq.edu.au/csm_potential.

3.
Nucleic Acids Res ; 51(W1): W122-W128, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37283042

ABSTRACT

Understanding the effects of mutations on protein stability is crucial for variant interpretation and prioritisation, protein engineering, and biotechnology. Despite significant efforts, community assessments of predictive tools have highlighted ongoing limitations, including computational time, low predictive power, and biased predictions towards destabilising mutations. To fill this gap, we developed DDMut, a fast and accurate siamese network to predict changes in Gibbs Free Energy upon single and multiple point mutations, leveraging both forward and hypothetical reverse mutations to account for model anti-symmetry. Deep learning models were built by integrating graph-based representations of the localised 3D environment, with convolutional layers and transformer encoders. This combination better captured the distance patterns between atoms by extracting both short-range and long-range interactions. DDMut achieved Pearson's correlations of up to 0.70 (RMSE: 1.37 kcal/mol) on single point mutations, and 0.70 (RMSE: 1.84 kcal/mol) on double/triple mutants, outperforming most available methods across non-redundant blind test sets. Importantly, DDMut was highly scalable and demonstrated anti-symmetric performance on both destabilising and stabilising mutations. We believe DDMut will be a useful platform to better understand the functional consequences of mutations, and guide rational protein engineering. DDMut is freely available as a web server and API at https://biosig.lab.uq.edu.au/ddmut.
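
The evaluation metrics and the anti-symmetry property named in this abstract can be illustrated with a minimal sketch. The ΔΔG values below are made up for illustration, and the function names are hypothetical; this is not DDMut's code or API. Anti-symmetry is taken here to mean that the predicted ΔΔG of a hypothetical reverse mutation should be the negative of the forward prediction.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rmse(pred, true):
    """Root mean squared error, in the same units as the inputs."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))

def antisymmetry_bias(forward, reverse):
    """Mean of (forward + reverse); 0 for a perfectly anti-symmetric predictor."""
    return sum(f + r for f, r in zip(forward, reverse)) / len(forward)

# Illustrative values only (kcal/mol), not taken from any dataset.
experimental = [-1.2, 0.4, -2.5, 0.9]
predicted_fwd = [-1.0, 0.1, -2.0, 1.1]
predicted_rev = [1.0, -0.1, 2.0, -1.1]

print(pearson(predicted_fwd, experimental))
print(rmse(predicted_fwd, experimental))
print(antisymmetry_bias(predicted_fwd, predicted_rev))
```

A bias far from zero would indicate that a predictor favours destabilising or stabilising calls, which is the kind of skew the abstract describes.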


Subject(s)
Deep Learning, Protein Stability, Proteins, Software, Mutation, Point Mutation, Proteins/chemistry, Proteins/genetics
4.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37382560

ABSTRACT

MOTIVATION: With the development of sequencing techniques, the discovery of new proteins significantly exceeds the human capacity and resources for experimentally characterizing protein functions. Localization, EC numbers, and GO terms with the structure-based Cutoff Scanning Matrix (LEGO-CSM) is a comprehensive web-based resource that fills this gap by applying well-established and robust graph-based signatures to supervised learning models, using both protein sequence and structure information to accurately model protein function in terms of Subcellular Localization, Enzyme Commission (EC) numbers, and Gene Ontology (GO) terms. RESULTS: We show our models perform as well as or better than alternative approaches, achieving area under the receiver operating characteristic curve of up to 0.93 for subcellular localization, up to 0.93 for EC, and up to 0.81 for GO terms on independent blind tests. AVAILABILITY AND IMPLEMENTATION: LEGO-CSM's web server is freely available at https://biosig.lab.uq.edu.au/lego_csm. In addition, all datasets used to train and test LEGO-CSM's models can be downloaded at https://biosig.lab.uq.edu.au/lego_csm/data.
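
The area under the receiver operating characteristic curve reported here can be read as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are assumptions for illustration, not part of LEGO-CSM.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example: one positive (0.3) is ranked below one negative (0.8).
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations rather than large benchmarks.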


Subject(s)
Proteins, Software, Humans, Proteins/chemistry
5.
Pharmaceutics ; 15(2)2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36839752

ABSTRACT

Biologics are one of the most rapidly expanding classes of therapeutics, but can be associated with a range of toxic properties. In small-molecule drug development, early identification of potential toxicity led to a significant reduction in clinical trial failures; however, we currently lack robust qualitative rules or predictive tools for peptide- and protein-based biologics. To address this, we have manually curated the largest set of high-quality experimental data on peptide and protein toxicities, and developed CSM-Toxin, a novel in silico protein toxicity classifier, which relies solely on the protein primary sequence. Our approach encodes the protein sequence information using a deep learning natural language model to understand "biological" language, where residues are treated as words and protein sequences as sentences. CSM-Toxin was able to accurately identify peptides and proteins with potential toxicity, achieving an MCC of up to 0.66 across both cross-validation and multiple non-redundant blind tests, outperforming other methods and highlighting the robust and generalisable performance of our model. We strongly believe CSM-Toxin will serve as a valuable platform to minimise potential toxicity in the biologic development pipeline. Our method is freely available as an easy-to-use web server.
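
The idea of treating residues as words and sequences as sentences can be sketched as a simple tokenisation step. The vocabulary, the id assignment and the `tokenize` function below are hypothetical; the abstract does not say which language model or tokeniser CSM-Toxin uses.

```python
# Hypothetical vocabulary: the 20 standard amino acids, one token per residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
UNK_ID = 0  # id reserved for anything outside the 20 standard residues
VOCAB = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(sequence):
    """Map a protein sequence (the 'sentence') to residue ids (the 'words')."""
    return [VOCAB.get(residue, UNK_ID) for residue in sequence.upper()]

print(tokenize("ACDX"))
```

A language model would then consume these ids in the way a text model consumes word ids; that downstream step is outside this sketch.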

6.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36484688

ABSTRACT

MOTIVATION: Over 300 000 protein-protein interaction (PPI) pairs have been identified in the human proteome and targeting these is fast becoming the next frontier in drug design. Predicting PPI sites, however, is a challenging task that traditionally requires computationally expensive and time-consuming docking simulations. A major weakness of modern protein docking algorithms is the inability to account for protein flexibility, which ultimately leads to relatively poor results. RESULTS: Here, we propose DockNet, an efficient Siamese graph-based neural network method which predicts contact residues between two interacting proteins. Unlike other methods that only utilize a protein's surface or treat the protein structure as a rigid body, DockNet incorporates the entire protein structure and places no limits on protein flexibility during an interaction. Predictions are modeled at the residue level, based on a diverse set of input node features including residue type, surface accessibility, residue depth, secondary structure, pharmacophore and torsional angles. DockNet is comparable to current state-of-the-art methods, achieving an area under the curve (AUC) value of up to 0.84 on an independent test set (DB5), can be applied to a variety of different protein structures and can be utilized in situations where accurate unbound protein structures cannot be obtained. AVAILABILITY AND IMPLEMENTATION: DockNet is available at https://github.com/npwilliams09/docknet and an easy-to-use webserver at https://biosig.lab.uq.edu.au/docknet. All other data underlying this article are available in the article and in its online supplementary material. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Computer Neural Networks, Humans, Proteome, Pharmacophore, Area Under Curve, Computational Biology
7.
Nat Struct Mol Biol ; 29(11): 1056-1067, 2022 11.
Article in English | MEDLINE | ID: mdl-36344848

ABSTRACT

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.


Subject(s)
Computational Biology, Furylfuramide, Computational Biology/methods, Binding Sites, Proteins/chemistry, Protein Databases, Protein Conformation
8.
Protein Sci ; 31(11): e4453, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36305769

ABSTRACT

Protein phosphorylation acts as an essential on/off switch in many cellular signaling pathways. This has led to ongoing interest in targeting kinases for therapeutic intervention. Computer-aided drug discovery has proven a useful and cost-effective approach for facilitating prioritization and enrichment of screening libraries, but limited effort has been devoted to providing insights on what makes a potent kinase inhibitor. To fill this gap, here we developed kinCSM, an integrative computational tool capable of accurately identifying potent cyclin-dependent kinase 2 (CDK2) inhibitors, quantitatively predicting CDK2 ligand-kinase inhibition constants (pKi) and classifying different types of inhibitors based on their favorable binding modes. kinCSM predictive models were built using supervised learning and leveraged the concept of graph-based signatures to capture both physicochemical and geometric properties of small molecules. CDK2 inhibitors were accurately identified with Matthews correlation coefficients (MCC) of up to 0.74, and inhibition constants predicted with Pearson's correlation of up to 0.76, both with consistent performances of 0.66 and 0.68 on a nonredundant blind test, respectively. kinCSM was also able to identify the potential type of inhibition for a given molecule, achieving MCC of up to 0.80 on cross-validation and 0.73 on the blind test. Analyzing the molecular composition revealed enriched chemical fragments in CDK2 inhibitors and different types of inhibitors, which provides insights into the molecular mechanisms behind ligand-kinase interactions. kinCSM will be an invaluable tool to guide future kinase drug discovery. To aid the fast and accurate screening of CDK2 inhibitors, kinCSM is freely available at https://biosig.lab.uq.edu.au/kin_csm/.
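
The Matthews correlation coefficient used in this and several other abstracts in this listing can be computed from a binary confusion matrix. The sketch below uses the standard formula; the function name and the convention of returning 0.0 when the denominator vanishes are choices made here for illustration, not taken from kinCSM.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when any marginal total is zero,
    a common convention for the undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0

# Illustrative counts only.
print(mcc(tp=40, tn=45, fp=5, fn=10))
```

Unlike accuracy, the coefficient stays informative when inhibitors and non-inhibitors are present in very different proportions, which is one reason it is often preferred for imbalanced screening data.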


Subject(s)
Antineoplastic Agents, Protein Kinase Inhibitors, Cyclin-Dependent Kinase 2/chemistry, Ligands, Protein Kinase Inhibitors/pharmacology, Protein Kinase Inhibitors/chemistry, Drug Discovery, Antineoplastic Agents/chemistry
9.
Curr Res Struct Biol ; 4: 271-277, 2022.
Article in English | MEDLINE | ID: mdl-36118553

ABSTRACT

Alkaptonuria (AKU), a rare genetic disorder, is characterized by the accumulation of homogentisic acid (HGA) in the body. Affected individuals lack functional levels of an enzyme required to break down HGA. Mutations in the homogentisate 1,2-dioxygenase (HGD) gene cause AKU and are responsible for deficient levels of functional HGD, which, in turn, leads to excess levels of HGA. Although HGA is rapidly cleared from the body by the kidneys, in the long term it starts accumulating in various tissues, especially cartilage. Over time (rarely before adulthood), it eventually changes the color of affected tissue to slate blue or black. Here we report a comprehensive mutation analysis of 111 pathogenic and 190 non-pathogenic HGD missense mutations using protein structural information. Using our comprehensive suite of graph-based signature methods, mCSM, complemented with sequence-based tools, we studied the functional and molecular consequences of each mutation on protein stability, interaction and evolutionary conservation. The scores generated from the structure- and sequence-based tools were used to train a supervised machine learning algorithm with 89% accuracy. The empirical classifier was used to generate the variant phenotype for novel HGD missense mutations. All this information is deployed as a user-friendly, freely available web server called HGDiscovery (https://biosig.lab.uq.edu.au/hgdiscovery/).

10.
Protein Sci ; 31(10): e4442, 2022 10.
Article in English | MEDLINE | ID: mdl-36173168

ABSTRACT

Peptides are attractive alternatives for the development of new therapeutic strategies due to their versatility and low complexity of synthesis. Increasing interest in these molecules has led to the creation of large collections of experimentally characterized therapeutic peptides, which greatly contributes to the development of data-driven computational approaches. Here we propose CSM-peptides, a novel machine learning method for rapid identification of eight different types of therapeutic peptides: anti-angiogenic, anti-bacterial, anti-cancer, anti-inflammatory, anti-viral, cell-penetrating, quorum sensing, and surface binding. Our method has been shown to outperform existing approaches, achieving an AUC of up to 0.92 on independent blind tests, and consistent performance on cross-validation. We anticipate CSM-peptides to be of great value in helping screen large libraries to identify novel peptides with therapeutic potential and have made it freely available as a user-friendly web server and Application Programming Interface at https://biosig.lab.uq.edu.au/csm_peptides.


Subject(s)
Peptides, Software, Anti-Inflammatory Agents, Computational Biology/methods, Machine Learning, Peptides/chemistry
11.
Brief Bioinform ; 23(4)2022 07 18.
Article in English | MEDLINE | ID: mdl-35656714

ABSTRACT

Proteins are capable of highly specific interactions and are responsible for a wide range of functions, making them attractive in the pursuit of new therapeutic options. Previous studies focusing on the overall geometry of protein-protein interfaces, however, concluded that these interfaces were generally flat. More recently, this idea has been challenged by their structural and thermodynamic characterisation, suggesting the existence of concave binding sites that are closer in character to traditional small-molecule binding sites, rather than exhibiting complete flatness. Here, we present a large-scale analysis of binding geometry and physicochemical properties of all protein-protein interfaces available in the Protein Data Bank. In this review, we provide a comprehensive overview of the protein-protein interface landscape, including evidence that even for overall larger, flatter interfaces that utilize discontinuous interacting regions, small and potentially druggable pockets are utilized at binding sites.


Subject(s)
Proteins, Binding Sites, Protein Databases, Protein Binding, Proteins/chemistry
12.
Nucleic Acids Res ; 50(W1): W204-W209, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35609999

ABSTRACT

Recent advances in protein structural modelling have enabled the accurate prediction of the holo 3D structures of almost any protein; however, protein function is intrinsically linked to the interactions it makes. While a number of computational approaches have been proposed to explore potential biological interactions, they have been limited to specific interactions, and have not been readily accessible for non-experts or for use in bioinformatics pipelines. Here we present CSM-Potential, a geometric deep learning approach to identify regions of a protein surface that are likely to mediate protein-protein and protein-ligand interactions in order to provide a link between 3D structure and biological function. Our method has shown robust performance, outperforming existing methods for both predictive tasks. By assessing the performance of CSM-Potential on independent blind tests, we show that our method was able to achieve ROC AUC values of up to 0.81 for the identification of potential protein-protein binding sites, and up to 0.96 accuracy on biological ligand classification. Our method is freely available as a user-friendly web server and API at http://biosig.unimelb.edu.au/csm_potential.


Subject(s)
Deep Learning, Protein Interaction Mapping, Software, Binding Sites, Ligands, Membrane Proteins, Protein Interaction Mapping/methods, Protein Conformation
13.
Front Cell Dev Biol ; 10: 786268, 2022.
Article in English | MEDLINE | ID: mdl-35300415

ABSTRACT

Mitochondria are complex organelles containing 13 proteins encoded by mitochondrial DNA and over 1,000 proteins encoded by nuclear DNA. Many mitochondrial proteins are associated with the inner or outer mitochondrial membranes, either peripherally or as integral membrane proteins, while others reside in either of the two soluble mitochondrial compartments, the mitochondrial matrix and the intermembrane space. The biogenesis of the five complexes of the oxidative phosphorylation system is an exemplar of this complexity. These large multi-subunit complexes are composed of more than 80 proteins with both membrane integral and peripheral associations and require soluble, membrane integral and peripherally associated assembly factor proteins for their biogenesis. Mutations causing human mitochondrial disease can lead to defective complex assembly due to the loss or altered function of the affected protein and subsequent destabilization of its interactors. Here we couple sodium carbonate extraction with quantitative mass spectrometry (SCE-MS) to track changes in the membrane association of the mitochondrial proteome across multiple human knockout cell lines. In addition to identifying the membrane association status of over 840 human mitochondrial proteins, we show how SCE-MS can be used to understand the impacts of defective complex assembly on protein solubility, giving insights into how specific subunits and sub-complexes become destabilized.

14.
J Chem Inf Model ; 61(11): 5438-5445, 2021 11 22.
Article in English | MEDLINE | ID: mdl-34719929

ABSTRACT

Protein-protein interactions are promising sites for development of selective drugs; however, they have generally been viewed as challenging targets. Molecules targeting protein-protein interactions tend to be larger and more lipophilic than other drug-like molecules, mimicking the properties of interacting interfaces. Here, we propose a machine learning approach that uses a graph-based representation of small molecules to guide identification of inhibitors modulating protein-protein interactions, pdCSM-PPI. This approach was applied to 21 different PPI targets. We developed interaction-specific models that were able to accurately identify active compounds achieving MCC and F1 scores up to 1, and Pearson's correlations up to 0.87, outperforming previous approaches. Using insights from these individual models, we developed a generic protein-protein interaction modulator predictive model, which accurately predicted IC50 with a Pearson's correlation of 0.64 on a low redundancy blind test. Importantly, we were able to accurately identify active from inactive compounds, achieving an AUC of 0.77 and sensitivity and specificity of 76% and 78%, respectively. We believe pdCSM-PPI will be an important tool to help guide more efficient screening of new PPI inhibitors; it is freely available as an easy-to-use web server and API at http://biosig.unimelb.edu.au/pdcsm_ppi.


Subject(s)
Machine Learning, Software, Protein Interaction Mapping
15.
Comput Struct Biotechnol J ; 19: 5381-5391, 2021.
Article in English | MEDLINE | ID: mdl-34667533

ABSTRACT

Kinases play crucial roles in cellular signalling and biological processes, with their dysregulation associated with diseases, including cancers. Kinase inhibitors, most notably those targeting ABeLson 1 (ABL1) kinase in chronic myeloid leukemia, have had a significant impact on cancer survival, yet emergence of resistance mutations can reduce their effectiveness, leading to therapeutic failure. Limited effort, however, has been devoted to developing tools to accurately identify ABL1 resistance mutations, as well as providing insights into their molecular mechanisms. Here we investigated the structural basis of ABL1 mutations modulating binding affinity of eight FDA-approved drugs. We found mutations impair affinity of type I and type II inhibitors differently and used this insight to develop a novel web-based diagnostic tool, SUSPECT-ABL, to pre-emptively predict resistance profiles and binding free-energy changes (ΔΔG) of all possible ABL1 mutations against inhibitors with different binding modes. Resistance mutations in ABL1 were successfully identified, achieving a Matthews correlation coefficient of up to 0.73, and the resulting change in ligand binding affinity was predicted with a Pearson's correlation of up to 0.77, with performances consistent across non-redundant blind tests. Through an in silico saturation mutagenesis, our tool has identified possibly emerging resistance mutations, which offers opportunities for in vivo experimental validation. We believe SUSPECT-ABL will be an important tool not just for improving precision medicine efforts, but for facilitating the development of next-generation inhibitors that are less prone to resistance. We have made our tool freely available at http://biosig.unimelb.edu.au/suspect_abl/.

16.
JACC Cardiovasc Imaging ; 14(10): 1904-1915, 2021 10.
Article in English | MEDLINE | ID: mdl-34147443

ABSTRACT

OBJECTIVES: The purpose of this study was to identify whether machine learning from processing of continuous wavelet transforms (CWTs) to provide an "energy waveform" electrocardiogram (ewECG) could be integrated with echocardiographic assessment of subclinical systolic and diastolic left ventricular dysfunction (LVD). BACKGROUND: Asymptomatic LVD has management implications, but routine echocardiography is not undertaken in subjects at risk of heart failure. Signal processing of the surface ECG with the use of CWT can identify abnormal myocardial relaxation. METHODS: EwECG and echocardiography were undertaken in 398 participants at risk of heart failure (HF). Reduced global longitudinal strain (GLS ≤16%), diastolic abnormalities (E/e' >15, left atrial enlargement with E/e' >10 or impaired relaxation) or LV hypertrophy defined LVD. EwECG feature selection and supervised machine learning by random forest (RF) classifier were undertaken with 643 CWT-derived features and the ARIC (Atherosclerosis Risk In Communities) heart failure risk score. RESULTS: The ARIC score and 18 CWT features were selected to build an RF predictive model for LVD in a training dataset (n = 287; 60% female, median age 71 [interquartile range: 68 to 74] years). Model performance was tested in an independent group (n = 111; 49% female, median age 61 years [59 to 66 years]), demonstrating 85% sensitivity and 72% specificity (area under the receiver-operating characteristic curve [AUC]: 0.83; 95% confidence interval [CI]: 0.74 to 0.92). With the ARIC score removed, sensitivity was 88% and specificity 70% (AUC: 0.78; 95% CI: 0.70 to 0.86). RF models for reduced GLS and diastolic abnormalities including similar features had sensitivities that were unsuitable for screening. Conventional candidates for LVD screening (ARIC score, N-terminal pro-B-type natriuretic peptide, and standard automated ECG analysis) had inferior discriminative ability. Integration of ewECG in screening of people at risk of HF would reduce need for echocardiography by 45% while missing 12% of LVD cases. CONCLUSIONS: Machine learning applied to ewECG is a sensitive screening test for LVD, and its integration into screening of patients at risk for HF would reduce the number of echocardiograms by almost one-half.
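
The screening figures quoted in this abstract follow from a binary confusion matrix: sensitivity is the share of true cases detected, the missed fraction is its complement, and the share of people who screen negative is one plausible reading of the echocardiograms avoided. The sketch below uses made-up counts and a hypothetical function name; the study's actual counts are not given in the abstract, and the link between screen-negative fraction and avoided echocardiograms is an assumption.

```python
def screening_summary(tp, fn, tn, fp):
    """Summarise a screening test from confusion-matrix counts.

    tp/fn: diseased people screened positive/negative.
    tn/fp: healthy people screened negative/positive.
    """
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Diseased people the screen would miss (1 - sensitivity).
        "missed_fraction": fn / (tp + fn),
        # People not referred on, assuming only screen-positives get an echo.
        "screen_negative_fraction": (tn + fn) / total,
    }

# Illustrative counts only: 100 diseased and 100 healthy participants.
summary = screening_summary(tp=88, fn=12, tn=70, fp=30)
print(summary)
```

With these toy counts a sensitivity of 88% corresponds to 12% of cases missed, matching the relationship between the two percentages in the abstract; the screen-negative fraction depends on disease prevalence, so it differs from the reported 45%.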


Subject(s)
Left Ventricular Dysfunction, Aged, Echocardiography, Electrocardiography, Female, Humans, Machine Learning, Male, Middle Aged, Predictive Value of Tests, Left Ventricular Dysfunction/diagnostic imaging
17.
Nucleic Acids Res ; 49(W1): W438-W445, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34050760

ABSTRACT

The identification of disease-causal variants is non-trivial. By mapping population variation from over 448,000 exome and genome sequences to over 81,000 experimental structures and homology models of the human proteome, we have calculated regional intolerance to missense variation (Missense Tolerance Ratio, MTR), using a sliding window of 21-41 codons, and introduce a new 3D spatial intolerance to missense variation score (3D Missense Tolerance Ratio, MTR3D), using spheres of 5-8 Å. We show that the MTR3D is less biased by regions with limited data and more accurately identifies regions under purifying selection than estimates relying on the sequence alone. Intolerant regions were highly enriched for both ClinVar pathogenic and COSMIC somatic missense variants (Mann-Whitney U test P < 2.2 × 10⁻¹⁶). Further, we combine sequence- and spatial-based scores to generate a consensus score, MTRX, which distinguishes pathogenic from benign variants more accurately than either score separately (AUC = 0.85). The MTR3D server enables easy visualisation of population variation, MTR, MTR3D and MTRX scores across the entire gene and protein structure for >17,000 human genes and >42,000 alternate transcripts, including both Ensembl and RefSeq transcripts. MTR3D is freely available via a user-friendly web interface and API at http://biosig.unimelb.edu.au/mtr3d/.
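
The sliding-window calculation mentioned in this abstract can be sketched as follows. The abstract does not state the formula, so the sketch assumes the commonly published definition of the Missense Tolerance Ratio: the observed missense fraction among observed missense and synonymous variants, divided by the same fraction among all possible variants in the window. The per-codon input lists, the function names and the default window of 31 codons (inside the stated 21-41 range) are hypothetical and are not the MTR3D server's API.

```python
def mtr_window(obs_mis, obs_syn, poss_mis, poss_syn):
    """Missense tolerance ratio for one window (assumed definition).

    Returns None when the ratio is undefined, for example when no variants
    were observed in the window.
    """
    obs_total = obs_mis + obs_syn
    poss_total = poss_mis + poss_syn
    if obs_total == 0 or poss_total == 0 or poss_mis == 0:
        return None
    return (obs_mis / obs_total) / (poss_mis / poss_total)

def sliding_mtr(obs_mis, obs_syn, poss_mis, poss_syn, window=31):
    """Per-codon MTR using a window centred on each codon.

    Each argument is a per-codon list of counts. Windows are truncated at
    the ends of the transcript.
    """
    half = window // 2
    n = len(obs_mis)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        scores.append(
            mtr_window(
                sum(obs_mis[lo:hi]),
                sum(obs_syn[lo:hi]),
                sum(poss_mis[lo:hi]),
                sum(poss_syn[lo:hi]),
            )
        )
    return scores

# Toy transcript of five codons with a three-codon window.
print(sliding_mtr([0, 1, 2, 2, 1], [1, 1, 1, 1, 1],
                  [6, 6, 6, 6, 6], [3, 3, 3, 3, 3], window=3))
```

Under this definition a value below 1 means fewer missense variants were observed than the window's composition would predict, which is the signal of intolerance the abstract refers to. A 3D variant would replace the sequence window with the set of residues inside a sphere around each position.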


Subject(s)
Missense Mutation, Tertiary Protein Structure/genetics, Software, Genomics, Humans, Neoplasms/genetics, Protein Structural Homology
18.
Nucleic Acids Res ; 49(W1): W417-W424, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33893812

ABSTRACT

Protein-protein interactions play a crucial role in all cellular functions and biological processes and mutations leading to their disruption are enriched in many diseases. While a number of computational methods to assess the effects of variants on protein-protein binding affinity have been proposed, they are in general limited to the analysis of single point mutations and have been shown to perform poorly on independent test sets. Here, we present mmCSM-PPI, a scalable and effective machine learning model for accurately assessing changes in protein-protein binding affinity caused by single and multiple missense mutations. We expanded our well-established graph-based signatures in order to capture physicochemical and geometrical properties of multiple wild-type residue environments and integrated them with substitution scores and dynamics terms from normal mode analysis. mmCSM-PPI was able to achieve a Pearson's correlation of up to 0.75 (RMSE = 1.64 kcal/mol) under 10-fold cross-validation and 0.70 (RMSE = 2.06 kcal/mol) on a non-redundant blind test, outperforming existing methods. Our method is freely available as a user-friendly and easy-to-use web server and API at http://biosig.unimelb.edu.au/mmcsm_ppi.


Subject(s)
Missense Mutation, Point Mutation, Protein Interaction Mapping/methods, Software, Machine Learning
20.
Methods Mol Biol ; 2190: 1-32, 2021.
Article in English | MEDLINE | ID: mdl-32804359

ABSTRACT

Mutations in protein-coding regions can lead to large biological changes and are associated with genetic conditions, including cancers and Mendelian diseases, as well as drug resistance. Although whole genome and exome sequencing help to elucidate potential genotype-phenotype correlations, there is a large gap between the identification of new variants and deciphering their molecular consequences. A comprehensive understanding of these mechanistic consequences is crucial to better understand and treat diseases in a more personalized and effective way. This is particularly relevant considering estimates that over 80% of mutations associated with a disease are incorrectly assumed to be causative. A thorough analysis of potential effects of mutations is required to correctly identify the molecular mechanisms of disease and enable the distinction between disease-causing and non-disease-causing variation within a gene. Here we present an overview of our integrative mutation analysis platform, which focuses on refining the current genotype-phenotype correlation methods by using the wealth of protein structural information.


Subject(s)
DNA Mutational Analysis/methods, Genetic Association Studies/methods, Mutation/genetics, Exome/genetics, Genotype, Humans, Phenotype, Exome Sequencing/methods